{smcl}
{* 26 Feb 2002/16 Sept 2008/28 Oct 2008/14 Mar 2012}{...}
{hline}
{cmd:help distinct}{right: ({browse "http://www.stata-journal.com/article.html?article=dm0042_1":SJ12-2: dm0042})}
{hline}

{title:Title}

{p2colset 5 17 19 2}{...}
{p2col: {hi:distinct} {hline 2}}Report number(s) of distinct observations or values{p_end}
{p2colreset}{...}


{title:Syntax}

{p 8 17 2}{cmd:distinct} [{varlist}]
{ifin}
[{cmd:,} {opt miss:ing}
{opt a:bbrev(#)} 
{opt j:oint}
{opt min:imum(#)}
{opt max:imum(#)}
]  

{p 4 4 2}{cmd:by} is allowed; see {manhelp by D}.


{title:Description}

{p 4 4 2}The {cmd:distinct} command displays the number of distinct
observations with respect to the variables in {varlist}.  By default, each
variable is considered separately so that the number of distinct observations
for each variable is reported; the number of distinct observations is the same
as the number of distinct values.  Optionally, variables can be considered
jointly so that the number of distinct groups defined by the values of
variables in {it:varlist} is reported. 

{p 4 4 2}By default, missing values are not counted.  {it:varlist} may
contain both numeric and string variables.  


{title:Options} 

{p 4 4 2}{cmd:missing} specifies that missing values are to be included
in counting distinct observations. 

{p 4 4 2}{opt abbrev(#)} specifies that variable names are to be
displayed abbreviated to at most {it:#} characters.  This option has no
effect with {cmd:joint}. 

{p 4 4 2}{cmd:joint} specifies that distinctness is to be determined
jointly for the variables in {it:varlist}. 

{p 4 4 2}{opt minimum(#)} specifies that numbers of distinct values are to be displayed only if they are 
equal to or greater than a specified minimum. 

{p 4 4 2}{opt maximum(#)} specifies that numbers of distinct values are to be displayed only if they are 
less than or equal to a specified maximum. 


{title:Remarks} 

{p 4 4 2}Distinctness, duplication, and uniqueness are different aspects
of the similarity and difference of observations.  Suppose the values of some
variable are 1, 2, 2, 3, 3, 3, 4, 4, 4, 4.  Then there are four distinct
values: 1, 2, 3, and 4.  Alternatively, there are, so far as this variable is
concerned, four distinct observations because, for example, the second and
third observations both containing the value 2 are identical in respect to this
variable.  Of these values, 2, 3, and 4 are duplicated in the data, meaning
that each occurs twice or more.  Some people refer to the distinct values as
unique values, even though in general distinct values could all be repeated in
the data.  One logic behind that terminology is that if you remove all
duplicates from these data then you are left with four distinct values, each of
which occurs once. 

{p 4 4 2}Now consider distinctness determined jointly for two variables.
Suppose observations are 1 and {cmd:"a"}, 2 and {cmd:"b"}, 2 and {cmd:"b"}, 3
and {cmd:"c"}, 3 and {cmd:"c"}, 3 and {cmd:"d"}, 4 and {cmd:"c"}, 4 and
{cmd:"c"}, 4 and {cmd:"d"}, 4 and {cmd:"d"}.  Then, as far as these two
variables are concerned, there are six distinct observations, 1 and {cmd:"a"},
2 and {cmd:"b"}, 3 and {cmd:"c"}, 3 and {cmd:"d"}, 4 and {cmd:"c"}, 4 and
{cmd:"d"}.  Considering the variables individually, there are four distinct
values for the first variable and four for the second.  Clearly, the same
principles of considering variables individually and jointly extend to three or
more variables. 


{title:Saved results}

{pstd}{cmd:distinct} saves the following in {cmd:r()}:

{synoptset 14 tabbed}{...}
{p2col 5 14 18 2: Scalars}{p_end}
{synopt:{cmd:r(ndistinct)}}distinct count (for last variable, 
           or jointly considered group of variables, 
           and, if specified, last {cmd:by} group){p_end}
{synopt:{cmd:r(N)}}number of observations (for last variable,
           or jointly considered group of variables, 
           and, if specified, last {cmd:by} group){p_end}
{p2colreset}{...}


{title:Examples}

{p 4 4 2}{cmd:. sysuse auto}{p_end}
{p 4 4 2}{cmd:. distinct} {p_end}
{p 4 4 2}{cmd:. distinct, max(10)} {p_end}
{p 4 4 2}{cmd:. distinct make-headroom}{p_end}
{p 4 4 2}{cmd:. distinct make-headroom, missing abbrev(6)}{p_end}
{p 4 4 2}{cmd:. distinct foreign rep78, joint}{p_end}
{p 4 4 2}{cmd:. distinct foreign rep78, joint missing}


{title:Authors}

{p 4 4 2}Gary Longton, Fred Hutchinson Cancer Research Center, USA{break}
glongton@fhcrc.org

{p 4 4 2}Nicholas J. Cox, Durham University, UK{break} 
n.j.cox@durham.ac.uk


{title:Acknowledgment}

{p 4 4 2}This program grew out of one originally posted to Statalist by
Patrick Royston for Stata 4.


{title:Also see}

{psee}
Article: {it:Stata Journal}, volume 8, number 4: {browse "http://www.stata-journal.com/article.html?article=dm0042":dm0042}

{psee}
Online:
{manhelp codebook D}, 
{manhelp contract D}, 
{manhelp duplicates D}, 
{manhelp egen D},
{manhelp inspect D},
{manhelp isid D},
{manhelp levelsof P},  
{help tabulate}, 
{help groups} (if installed) 
{p_end}
